18 research outputs found
Balanced Training of Energy-Based Models with Adaptive Flow Sampling
Energy-based models (EBMs) are versatile density estimation models that
directly parameterize an unnormalized log density. Although very flexible, EBMs
lack a tractable normalization constant, making the model's likelihood
computationally intractable. Several approximate samplers and
variational inference techniques have been proposed to estimate the likelihood
gradients for training. These techniques have shown promising results in
generating samples, but little attention has been paid to the statistical
accuracy of the estimated density, such as determining the relative importance
of different classes in a dataset. In this work, we propose a new maximum
likelihood training algorithm for EBMs that uses a different type of generative
model, normalizing flows (NF), which have recently been proposed to facilitate
sampling. Our method fits an NF to an EBM during training so that an
NF-assisted sampling scheme provides an accurate gradient for the EBM at all
times, ultimately leading to a fast sampler for generating new data.
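The maximum-likelihood gradient the sampler must estimate is E_data[dE/dθ] − E_model[dE/dθ]. A minimal sketch of this idea on a toy quadratic energy, with a Gaussian proposal adapted to the current model standing in for the learned flow (illustrative names and setup, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D EBM with energy E_theta(x) = theta * x**2 (hypothetical example);
# the synthetic data below correspond to theta = 2.
def grad_energy_wrt_theta(x):
    return x ** 2

data = rng.normal(0.0, 0.5, size=10_000)

# Stand-in for the adaptive flow: a Gaussian proposal matched to the
# current EBM, which for this quadratic energy is an exact sampler.
theta, lr = 0.5, 0.05
for _ in range(500):
    scale = 1.0 / np.sqrt(2.0 * theta)
    model_samples = rng.normal(0.0, scale, size=1024)
    # Maximum-likelihood gradient: E_data[dE/dtheta] - E_model[dE/dtheta]
    grad = grad_energy_wrt_theta(data).mean() - grad_energy_wrt_theta(model_samples).mean()
    theta -= lr * grad
```

In a real EBM the model expectation cannot be sampled exactly, which is precisely why the paper keeps an NF fitted to the EBM throughout training.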
A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are energy-based neural networks that are
commonly used as building blocks for deep neural architectures. In this work,
we derive a deterministic framework for the
training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer
(TAP) mean-field approximation of widely connected systems with weak
interactions, a construction originating in spin-glass theory. While the TAP approach has been
extensively studied for fully-visible binary spin systems, our construction is
generalized to latent-variable models, as well as to arbitrarily distributed
real-valued spin systems with bounded support. In our numerical experiments, we
demonstrate the effective deterministic training of our proposed models and are
able to show interesting features of unsupervised learning which could not be
directly observed with sampling. Additionally, we demonstrate how our
TAP-based framework can leverage trained RBMs as joint priors in
denoising problems.
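For a binary 0/1 RBM, the second-order (TAP) mean-field magnetizations satisfy a pair of coupled fixed-point equations that can be iterated deterministically, with no sampling. A minimal sketch of such a damped fixed-point iteration (the function name and damping scheme are illustrative, not the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tap_magnetizations(W, b, c, n_iter=200, damping=0.5, seed=0):
    """Damped fixed-point iteration of second-order (TAP) mean-field
    equations for a binary 0/1 RBM with weights W, biases b (visible)
    and c (hidden). A sketch of deterministic RBM evaluation."""
    rng = np.random.default_rng(seed)
    nv, nh = W.shape
    mv = rng.uniform(0.25, 0.75, size=nv)  # visible magnetizations
    mh = rng.uniform(0.25, 0.75, size=nh)  # hidden magnetizations
    for _ in range(n_iter):
        var_v = mv - mv ** 2
        # naive mean-field term plus second-order (Onsager) correction
        mh_new = sigmoid(c + W.T @ mv - (mh - 0.5) * ((W ** 2).T @ var_v))
        mh = damping * mh + (1 - damping) * mh_new
        var_h = mh - mh ** 2
        mv_new = sigmoid(b + W @ mh - (mv - 0.5) * ((W ** 2) @ var_h))
        mv = damping * mv + (1 - damping) * mv_new
    return mv, mh
```

The converged magnetizations give deterministic estimates of unit activations, which is what enables evaluation and training without Monte Carlo chains.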
Inferring Sparsity: Compressed Sensing using Generalized Restricted Boltzmann Machines
In this work, we consider compressed sensing reconstruction from
measurements of -sparse structured signals which do not possess a writable
correlation model. Assuming that a generative statistical model, such as a
Boltzmann machine, can be trained in an unsupervised manner on example signals,
we demonstrate how this signal model can be used within a Bayesian framework of
signal reconstruction. By deriving a message-passing inference scheme for
restricted Boltzmann machines with generally distributed units, we are able to integrate these
inferred signal models into approximate message passing for compressed sensing
reconstruction. Finally, we show for the MNIST dataset that this approach can
be very effective, even for .
Comment: IEEE Information Theory Workshop, 201
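As a rough illustration of the approximate message passing (AMP) scaffold that the RBM prior plugs into, here is plain AMP with a soft-threshold sparsity denoiser in place of the learned RBM prior (a sketch under that substitution, not the paper's algorithm):

```python
import numpy as np

def soft(r, t):
    # soft-threshold denoiser (simple sparsity prior; the paper
    # substitutes an RBM-based denoiser here)
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def amp(A, y, n_iter=30, alpha=1.5):
    """Approximate message passing for y = A x with sparse x."""
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(n_iter):
        r = x + A.T @ z                       # effective pseudo-data
        tau = np.linalg.norm(z) / np.sqrt(m)  # effective noise level
        x = soft(r, alpha * tau)
        # Onsager correction keeps the effective noise Gaussian
        z = y - A @ x + z * np.count_nonzero(x) / m
    return x
```

Swapping the denoiser for one derived from a trained generative model is what lets the reconstruction exploit structure beyond plain sparsity.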
Neural networks: from the perceptron to deep nets
Artificial neural networks have been studied through the prism of statistical
mechanics as disordered systems since the 80s, starting from the simple models
of Hopfield's associative memory and the single-neuron perceptron classifier.
Assuming data is generated by a teacher model, asymptotic generalisation
predictions were originally derived using the replica method, and the online
learning dynamics has been described in the large-system limit. In this
chapter, we review the key original ideas of this literature along with their
heritage in the ongoing quest to understand the efficiency of modern deep
learning algorithms. One goal of current and future research is to characterize
the bias of learning algorithms toward well-generalising minima in
complex overparametrized loss landscapes with many solutions perfectly
interpolating the training data. Works on perceptrons, two-layer committee
machines and kernel-like learning machines shed light on these benefits of
overparametrization. Another goal is to understand the advantage of depth, as
models now commonly feature tens or hundreds of layers. While replica
computations apparently fall short of describing learning in general deep
neural networks, studies of simplified linear or untrained models, as well as
the derivation of scaling laws, provide the first elements of an answer.
Comment: Contribution to the book Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years; Chap. 2
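The teacher-student setup mentioned above can be illustrated numerically: a teacher perceptron labels random inputs, a student learns from P = αN examples, and the generalisation error is the angle between the two weight vectors. A toy sketch using the simple Hebbian rule (illustrative choices, not taken from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50  # input dimension (toy size)

def generalisation_error(alpha, trials=20):
    """Teacher-student perceptron: average angular error of a Hebbian
    student trained on P = alpha * N random teacher-labelled examples."""
    errs = []
    for _ in range(trials):
        teacher = rng.normal(size=N)
        P = int(alpha * N)
        X = rng.normal(size=(P, N))
        y = np.sign(X @ teacher)
        # Hebbian rule: w = sum_mu y_mu x_mu (simplest learning rule)
        w = (y[:, None] * X).sum(axis=0)
        cos = w @ teacher / (np.linalg.norm(w) * np.linalg.norm(teacher))
        # generalisation error = angle between teacher and student / pi
        errs.append(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)
    return float(np.mean(errs))
```

The error decreases as α = P/N grows, which is the kind of asymptotic generalisation curve the replica method predicts analytically.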
Modern applications of machine learning in quantum sciences
In these Lecture Notes, we provide a comprehensive introduction to the most
recent advances in the application of machine learning methods in quantum
sciences. We cover the use of deep learning and kernel methods in supervised,
unsupervised, and reinforcement learning algorithms for phase classification,
representation of many-body quantum states, quantum feedback control, and
quantum circuit optimization. Moreover, we introduce and discuss more
specialized topics such as differentiable programming, generative models, the
statistical approach to machine learning, and quantum machine learning.
Comment: 268 pages, 87 figures. Comments and feedback are very welcome. Figures and tex files are available at https://github.com/Shmoo137/Lecture-Note
Understanding neural networks for machine learning via mean-field methods
Machine learning algorithms relying on deep neural networks recently allowed a great leap forward in artificial intelligence. Despite the popularity of their applications, the efficiency of these algorithms remains largely unexplained from a theoretical point of view. The mathematical description of learning problems involves very large collections of interacting random variables, difficult to handle analytically as well as numerically. This complexity is precisely the object of study of statistical physics, whose mission, originally directed at natural systems, is to understand how macroscopic behaviors arise from microscopic laws. In this thesis we propose to take advantage of recent progress in mean-field methods from statistical physics to derive relevant approximations in this context. We exploit the equivalences and complementarities of message-passing algorithms, high-temperature expansions, and the replica method. Following this strategy, we make practical contributions to the unsupervised learning of Boltzmann machines. We also make theoretical contributions by considering the teacher-student paradigm to model supervised learning problems. We develop a framework to characterize the evolution of information during training in these models. Additionally, we propose a research direction to generalize the analysis of Bayesian learning in shallow neural networks to their deep counterparts.
Optimizing Markov Chain Monte Carlo Convergence with Normalizing Flows and Gibbs Sampling
Generative models have started to integrate into the scientific computing toolkit. One notable instance of this integration is the utilization of normalizing flows (NF) in the development of sampling and variational inference algorithms. This work introduces a novel algorithm, GflowMC, which relies on a Metropolis-within-Gibbs framework within the latent space of NFs. This approach addresses the challenge of vanishing acceptance probabilities often encountered when using NF-generated independent proposals, while retaining non-local updates, enhancing its suitability for sampling multi-modal distributions. We assess GflowMC's performance concentrating on the ϕ⁴ model from statistical mechanics. Our results demonstrate that by identifying an optimal size for partial updates, convergence of the Markov chain Monte Carlo (MCMC) can be achieved faster than with full updates. Additionally, we explore the adaptability of GflowMC for biasing proposals towards increasing the update frequency of critical coordinates, such as coordinates highly correlated to mode switching in multi-modal targets.
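The Metropolis-within-Gibbs idea of refreshing only a block of coordinates from an independent proposal can be sketched on a toy multi-modal target, with a fixed bimodal mixture standing in for the trained NF (a hypothetical stand-in, not GflowMC itself):

```python
import numpy as np

rng = np.random.default_rng(1)
D, SIGMA = 8, 0.5  # dimension and mode width of a toy bimodal target

def log_target(x):
    # independent bimodal marginals with modes at +/-2 (multi-modal target)
    return np.logaddexp(-0.5 * ((x - 2) / SIGMA) ** 2,
                        -0.5 * ((x + 2) / SIGMA) ** 2).sum(axis=-1)

def log_q(x):
    # stand-in for the trained NF: a broader bimodal independent proposal
    return np.logaddexp(-0.5 * (x - 2) ** 2, -0.5 * (x + 2) ** 2).sum(axis=-1)

def sample_q(k):
    return rng.choice([-2.0, 2.0], size=k) + rng.normal(size=k)

def mwg(n_steps=20_000, block=2):
    """Metropolis-within-Gibbs: refresh `block` coordinates per step from
    the independent proposal, keeping the remaining coordinates fixed."""
    x = sample_q(D)
    samples, accepted = [], 0
    for _ in range(n_steps):
        idx = rng.choice(D, size=block, replace=False)
        x_new = x.copy()
        x_new[idx] = sample_q(block)
        # Metropolis-Hastings ratio restricted to the updated block
        log_a = (log_target(x_new) - log_target(x)
                 + log_q(x[idx]) - log_q(x_new[idx]))
        if np.log(rng.uniform()) < log_a:
            x, accepted = x_new, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps
```

Shrinking the block keeps acceptance probabilities from vanishing in high dimension, while the independent per-block proposals still allow non-local jumps between modes, which is the trade-off the paper optimizes.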